Abstract: Data is now produced in enormous volumes. Everyone generates data through a variety of operations, transactions and devices, and machines bear the overhead of generating and storing it. This exponential growth makes the data increasingly challenging to manage: it is large and hard to store and retrieve[15]. Such data can be treated with efficient techniques for cleaning, compression and sorting[15]. Preprocessing can be used to remove basic English stop-words, making the data compact and easier to process further; dimensionality reduction techniques then make the data more efficient and specific[16]. The reduced data can later be clustered for better information retrieval[16]. This paper describes various dimensionality reduction and clustering techniques applied to the sample dataset C50test of 2500 documents, which gave promising results, compares them, and identifies the better approach for relevant information retrieval.
Keywords: High Dimensional Datasets, Dimensionality Reduction, SVD, PCA, Clustering, K-means, Fuzzy Clustering, Hierarchical Clustering.
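The preprocessing step mentioned in the abstract, removing basic English stop-words before any dimensionality reduction or clustering, can be illustrated with a minimal Python sketch. The stop-word list, the function names remove_stop_words and term_counts, and the two sample sentences are assumptions made for illustration only; they are not the list, code, or data used in this paper.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption, not the paper's list).
STOP_WORDS = frozenset({
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from",
    "in", "is", "it", "of", "on", "or", "that", "the", "this", "to",
    "was", "with",
})


def remove_stop_words(text):
    """Lowercase the text, split it into alphabetic tokens, drop stop-words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def term_counts(documents):
    """Return the sorted vocabulary and one term-count vector per document."""
    cleaned = [remove_stop_words(doc) for doc in documents]
    vocabulary = sorted({t for tokens in cleaned for t in tokens})
    vectors = []
    for tokens in cleaned:
        counts = Counter(tokens)
        # Counter returns 0 for terms that do not occur in this document.
        vectors.append([counts[term] for term in vocabulary])
    return vocabulary, vectors


# Hypothetical sample documents, used only to exercise the functions.
docs = [
    "The growth of data is a challenge for storage.",
    "Clustering the data helps retrieval of information.",
]
vocabulary, vectors = term_counts(docs)
```

The resulting term-count vectors have one dimension per distinct remaining word, which is the kind of high-dimensional representation that dimensionality reduction and clustering would then operate on.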